63 research outputs found

    Scalable RDF Data Compression using X10

    The Semantic Web comprises enormous volumes of semi-structured data elements. For interoperability, these elements are represented by long strings. Such representations are not efficient for the purposes of Semantic Web applications that perform computations over large volumes of information. A typical method for alleviating the impact of this problem is through the use of compression methods that produce more compact representations of the data. The use of dictionary encoding for this purpose is particularly prevalent in Semantic Web database systems. However, centralized implementations present performance bottlenecks, giving rise to the need for scalable, efficient distributed encoding schemes. In this paper, we describe an encoding implementation based on the asynchronous partitioned global address space (APGAS) parallel programming model. We evaluate performance on a cluster of up to 384 cores and datasets of up to 11 billion triples (1.9 TB). Compared to the state-of-the-art MapReduce algorithm, we demonstrate a speedup of 2.6-7.4x and excellent scalability. These results illustrate the strong potential of the APGAS model for efficient implementation of dictionary encoding and contribute to the engineering of larger-scale Semantic Web applications.
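
    The dictionary encoding mentioned in the abstract above can be illustrated with a minimal single-machine sketch. This is not the paper's APGAS/X10 implementation; the function names and the example URIs are hypothetical, and the sketch only shows the basic idea of replacing long term strings with compact integer IDs.

```python
def encode_triples(triples):
    """Assign each distinct term an integer ID and rewrite triples as ID tuples.

    Illustrative, centralized version only: IDs are handed out in order of
    first appearance. A distributed scheme would need to coordinate ID
    assignment across workers, which this sketch does not attempt.
    """
    term_to_id = {}
    encoded = []
    for triple in triples:
        ids = []
        for term in triple:
            if term not in term_to_id:
                # The next unused ID is simply the current dictionary size.
                term_to_id[term] = len(term_to_id)
            ids.append(term_to_id[term])
        encoded.append(tuple(ids))
    return term_to_id, encoded


def decode_triples(term_to_id, encoded):
    """Invert the dictionary and restore the original string triples."""
    id_to_term = {i: term for term, i in term_to_id.items()}
    return [tuple(id_to_term[i] for i in triple) for triple in encoded]


# Hypothetical example data: two triples sharing all three terms.
example = [
    ("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
    ("http://example.org/b", "http://example.org/knows", "http://example.org/a"),
]
dictionary, compact = encode_triples(example)
```

    Each long string is stored once in the dictionary, while the triples themselves shrink to small integer tuples; the single shared dictionary is also where a centralized version would become a bottleneck.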

    Gene expression profiling of gilthead sea bream during early development and detection of stress-related genes by the application of cDNA microarray technology

    Physiol Genomics 23: 182–191, 2005. First published July 26, 2005; doi:10.1152/physiolgenomics.00139.2005. Large-scale gene expression studies were performed for one of the main European aquaculture species, the gilthead sea bream Sparus auratus L. For this purpose, a cDNA microarray containing 10,176 clones from a cDNA library of mixed embryonic and larval stages was constructed. In addition to its importance for aquaculture, the taxonomic position and the relatively small genome size of sea bream make it a prospective model for evolutionary biology and comparative genomics. However, so far, no large-scale analysis of gene expression exists for this species. In the present study, gene expression was analyzed in gilthead sea bream during early development, a significant period in the determination of quantitative traits and therefore of considerable interest for aquaculture. Synexpression groups expressed primarily early and late in development were determined and were composed of both known and novel genes. Furthermore, it was possible to identify stress response genes induced by cortisol injections using the cDNA microarray generated. The creation of gene expression profiles for sea bream by microarray hybridization will accelerate identification of candidate genes involved in multifactorial traits and certain regulatory pathways and will also contribute to a better understanding of the genetic background of fish physiology, which may help to improve aquaculture practices. We thank Dr. M. Pankratz and lab for providing the microarray spotting facilities and Dr. C. Seiler for support in generating pictures of the developmental stages of sea bream. Sequences reported in this article have been submitted to the National Center for Biotechnology Information (NCBI) EST database under Accession Nos. CB184056–CB184594 and CV133223–CV133736. Microarray expression data have been submitted to ArrayExpress under Accession Nos. E-MEXP-181 (experiment) and A-MEXP-110 (array) as well as to the NCBI Gene Expression Omnibus under Accession Nos. GSE 2064 and GSE 1887.

    Efficient Parallel Dictionary Encoding for RDF Data.

    The Semantic Web comprises enormous volumes of semi-structured data elements. For interoperability, these elements are represented by long strings. Such representations are not efficient for the purposes of Semantic Web applications that perform computations over large volumes of information. A typical method for alleviating the impact of this problem is through the use of compression methods that produce more compact representations of the data. The use of dictionary encoding for this purpose is particularly prevalent in Semantic Web database systems. However, centralized implementations present performance bottlenecks, giving rise to the need for scalable, efficient distributed encoding schemes. In this paper, we describe a straightforward but very efficient encoding algorithm and evaluate its performance on a cluster of up to 384 cores and datasets of up to 11 billion triples (1.9 TB). Compared to the state-of-the-art MapReduce algorithm, we demonstrate a speedup of 2.6-7.4x and excellent scalability.

    Metagenomics : tools and insights for analyzing next-generation sequencing data derived from biodiversity studies

    Advances in next-generation sequencing (NGS) have allowed significant breakthroughs in microbial ecology studies. This has led to the rapid expansion of research in the field and the establishment of “metagenomics”, often defined as the analysis of DNA from microbial communities in environmental samples without prior need for culturing. Many statistical and computational metagenomics tools and databases have been developed in order to allow the exploitation of the huge influx of data. In this review article, we provide an overview of the sequencing technologies and how they are uniquely suited to various types of metagenomic studies. We focus on the currently available bioinformatics techniques, tools, and methodologies for performing each individual step of a typical metagenomic dataset analysis. We also outline future trends in the field with respect to tools and technologies currently under development. Moreover, we discuss data management, distribution, and integration tools that are capable of performing comparative metagenomic analyses of multiple datasets using well-established databases, as well as commonly used annotation standards.

    Profiling of infection specific mRNA transcripts of the European seabass Dicentrarchus labrax

    Background: The European seabass (Dicentrarchus labrax), one of the most extensively cultured species in European aquaculture productions, is, along with the gilthead sea bream (Sparus aurata), a prospective model species for the Perciformes which includes several other commercially important species. Massive mortalities may be caused by bacterial or viral infections in intensive aquaculture production. Revealing transcripts involved in immune response and studying their relative expression enhances the understanding of the immune response mechanism and consequently also the creation of vaccines. The analysis of expressed sequence tags (EST) is an efficient and easy approach for gene discovery, comparative genomics and for examining gene expression in specific tissues in a qualitative and quantitative way. Results: Here we describe the construction, analysis and comparison of a total of ten cDNA libraries, six from different tissues infected with V. anguillarum (liver, spleen, head kidney, gill, peritoneal exudates and intestine) and four cDNA libraries from different tissues infected with Nodavirus (liver, spleen, head kidney and brain). In total 9605 sequences representing 3075 (32%) unique sequences (set of sequences obtained after clustering) were obtained and analysed. Among the sequences several immune-related proteins were identified for the first time in the order of Perciformes as well as in Teleostei. Conclusion: The present study provides new information to the Gene Index of seabass. It gives a unigene set that will make a significant contribution to functional genomic studies and to studies of differential gene expression in relation to the immune system. In addition some of the potentially interesting genes identified by in silico analysis and confirmed by real-time PCR are putative biomarkers for bacterial and viral infections in fish.

    Metagenomic investigation of the geologically unique Hellenic Volcanic Arc reveals a distinctive ecosystem with unexpected physiology

    Hydrothermal vents represent a deep, hot, aphotic biosphere where chemosynthetic primary producers, fuelled by chemicals from Earth's subsurface, form the basis of life. In this study, we examined microbial mats from two distinct volcanic sites within the Hellenic Volcanic Arc (HVA). The HVA is geologically and ecologically unique, with reported emissions of CO2‐saturated fluids at temperatures up to 220°C and a notable absence of macrofauna. Metagenomic data reveal highly complex prokaryotic communities composed of chemolithoautotrophs, some methanotrophs, and to our surprise, heterotrophs capable of anaerobic degradation of aromatic hydrocarbons. Our data suggest that aromatic hydrocarbons may indeed be a significant source of carbon in these sites, and instigate additional research into the nature and origin of these compounds in the HVA. Novel physiology was assigned to several uncultured prokaryotic lineages; most notably, a SAR406 representative is attributed with a role in anaerobic hydrocarbon degradation. This dataset, the largest to date from submarine volcanic ecosystems, constitutes a significant resource of novel genes and pathways with potential biotechnological applications.

    Geochemistry of CO2-Rich Gases Venting From Submarine Volcanism: The Case of Kolumbo (Hellenic Volcanic Arc, Greece)

    Studies of submarine hydrothermal systems in the Mediterranean Sea are limited to southern Italian volcanism and are entirely missing for the Aegean. Here, we report on the geochemistry of high-temperature fluids (up to 220°C) venting at 500 m b.s.l. from the floor of Kolumbo submarine volcano (Hellenic Volcanic Arc, Greece), which is located 7 km northeast of Santorini Island. Despite the recent unrest at Santorini, Kolumbo submarine volcano is considered more active due to a higher seismicity. Rizzo et al. (2016) investigated the He-isotope composition of gases collected from seven chimneys and showed that they are dominated by CO2 (>97%), with only a small air contamination. Here we provide more complete chemical data and isotopic compositions of CO2 and CH4, and Hg(0) concentration. We show that the gases emitted from different vents are fractionated by the partial dissolution of CO2 in water. Fractionation is also evident in the C-isotope composition (δ13C-CO2), which varies between -0.04 and 1.15‰. We modeled this process to reconstruct the chemistry and δ13C-CO2 of intact magmatic gases before fractionation. We argue that the CO2 prior to CO2 dissolution in water had δ13C ∼-0.4‰ and CO2/3He ∼1 × 10¹⁰. This model reveals that the gases emitted from Kolumbo originate from a homogeneous mantle contaminated with CO2, probably due to decarbonation of subducting limestone, which is similar to other Mediterranean arc volcanoes (e.g., Stromboli, Italy). The isotopic signature of CH4 (δ13C ∼-18‰ and δD ∼-117‰) is within a range of values typically observed for hydrothermal gases (e.g., Panarea and Campi Flegrei, Italy), which is suggestive of mixing between thermogenic and abiotic CH4. We report that the concentrations of Hg(0) in Kolumbo fluids are particularly high (∼61 to 1300 ng m⁻³) when compared to land-based fumaroles located on Santorini and worldwide aerial volcanic emissions. This finding may represent further evidence for the high level of magmatic activity at Kolumbo. Based on the geo-indicators of temperature and pressure, we calculate that the magmatic gases equilibrate within the Kolumbo hydrothermal system at about 270°C and at a depth of ∼1 km b.s.l.

    A marine biodiversity observation network for genetic monitoring of hard-bottom communities (ARMS-MBON)

    Marine hard-bottom communities are undergoing severe change under the influence of multiple drivers, notably climate change, extraction of natural resources, pollution and eutrophication, habitat degradation, and invasive species. Monitoring marine biodiversity in such habitats is, however, challenging as it typically involves expensive, non-standardized, and often destructive sampling methods that limit its scalability. Differences in monitoring approaches furthermore hinder inter-comparison among monitoring programs. Here, we announce a Marine Biodiversity Observation Network (MBON) consisting of Autonomous Reef Monitoring Structures (ARMS) with the aim of assessing the status and changes in benthic fauna with genomic-based methods, notably DNA metabarcoding, in combination with image-based identifications. This article presents the results of a 30-month pilot phase in which we established an operational and geographically expansive ARMS-MBON. The network currently consists of 20 observatories distributed across European coastal waters and the polar regions, in which 134 ARMS have been deployed to date. Sampling takes place annually, either as short-term deployments during the summer or as long-term deployments starting in spring. The pilot phase was used to establish a common set of standards for field sampling, genetic analysis, data management, and legal compliance, which are presented here. We also tested the potential of ARMS for combining genetic and image-based identification methods in comparative studies of benthic diversity, as well as for detecting non-indigenous species. Results show that ARMS are suitable for monitoring hard-bottom environments as they provide genetic data that can be continuously enriched, re-analyzed, and integrated with conventional data to document benthic community composition and detect non-indigenous species. Finally, we provide guidelines to expand the network and present a sustainability plan as part of the European Marine Biological Resource Centre (www.embrc.eu). Peer reviewed.